CS 506 HW1 Solution

Name:

1. Understanding K-means Clustering

(Please fill out the functions in k_means_clustering.py)

2. Working with the Algorithms

To determine the number of clusters, I used the elbow method, which I was familiar with from previous classes. I used elbow_plot() to plot the data and determined visually that k=3 was most likely optimal. Without the elbow plot, I would have started with k=3 by default, then plotted the clusters for several values of k and picked the one that looked most accurate.
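As an illustration of the elbow method described above, here is a minimal sketch that computes the within-cluster sum of squares (inertia) for several values of k on toy data. It is not the elbow_plot() from the assignment, whose signature is not shown here. The names k_means_inertia and farthest_point_init are hypothetical, and the farthest-point initialization is an assumption chosen to keep the example deterministic; the functions in k_means_clustering.py may initialize differently.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_point_init(points, k):
    """Hypothetical deterministic init: start from the first point, then
    repeatedly add the point farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        farthest = max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centroids),
        )
        centroids.append(farthest)
    return centroids


def k_means_inertia(points, k, iterations=100):
    """Run Lloyd's algorithm and return the within-cluster sum of squares."""
    centroids = farthest_point_init(points, k)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                mean = tuple(sum(m[d] for m in members) / len(members) for d in range(dim))
                new_centroids.append(mean)
            else:
                # Keep the old centroid if a cluster ends up empty.
                new_centroids.append(centroids[i])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return sum(min(squared_distance(p, c) for c in centroids) for p in points)


# Toy data (an assumption for illustration): three well-separated blobs.
rng = random.Random(1)
blob_centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 10.0)]
points = [
    (cx + rng.uniform(-1, 1), cy + rng.uniform(-1, 1))
    for cx, cy in blob_centers
    for _ in range(30)
]

inertias = {k: k_means_inertia(points, k) for k in range(1, 7)}
for k, inertia in inertias.items():
    print(k, round(inertia, 2))
```

On data like this, the inertia should drop sharply up to k=3 and flatten afterwards; that bend is the "elbow" read off the plot.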

2b List a few bullet points describing the pros and cons of the various clustering algorithms.

3 Data Visualization

3a Produce a Heatmap. Is this heatmap useful for drawing conclusions about the expensiveness of areas within NYC? If not, why?

The heatmap is mostly uniform in color, so it is not especially informative overall. It does show a clear hotspot around Manhattan, which is expected; its absence would have been a clear indicator that something was wrong. For other neighborhoods, however, it is hard to judge relative expensiveness because the color is so uniform, although a few smaller hotspots do appear around the map.

3b Visualize the clusters by plotting the longitude / latitude of every listing in a scatter plot

3c For every cluster, report the average price of the listings within this cluster
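One way to compute the per-cluster averages is sketched below. It assumes each listing already has a cluster label and a price; the function name average_price_per_cluster and the sample values are hypothetical and are not taken from the actual dataset or results.

```python
from collections import defaultdict


def average_price_per_cluster(labels, prices):
    """Return {cluster_label: mean price} for parallel lists of labels and prices."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for label, price in zip(labels, prices):
        totals[label] += price
        counts[label] += 1
    return {label: totals[label] / counts[label] for label in totals}


# Made-up example values, for illustration only.
labels = [0, 1, 0, 2, 1, 0]
prices = [100.0, 250.0, 140.0, 80.0, 150.0, 120.0]

averages = average_price_per_cluster(labels, prices)
for label in sorted(averages):
    print(f"cluster {label}: average price {averages[label]:.2f}")
```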

3d Bonus point (provide a plot on an actual NYC map)

3e Are the findings in agreement with what you have in mind about the cost of living for neighborhoods in NYC? If you are unfamiliar with NYC, you can consult the web.

Since the densest concentration of expensive clusters is in Manhattan, the findings are at least somewhat in agreement with my assumptions about the cost of living in NYC.

4. Image Manipulation

The final image for problem 4 was saved to "./p4_image_out.png"